Abstract. The binary heap of Williams (1964) is a simple priority queue
characterized by only storing an array containing the elements and the
number of elements n, here denoted a strictly implicit priority queue.
We introduce two new strictly implicit priority queues. The first
structure supports amortized O(1) time Insert and O(log n) time
ExtractMin operations, where both operations require amortized O(1) element
moves. No previous implicit heap with O(1) time Insert supports both
operations with O(1) moves. The second structure supports worst-case
O(1) time Insert and O(log n) time (and moves) ExtractMin operations.
Previous results were either amortized or needed O(log n) bits of
additional state information between operations.


1 Introduction 

In 1964 Williams presented “Algorithm 232”, commonly known as the binary
heap. The binary heap is a priority queue data structure storing a dynamic set
of n elements from a totally ordered universe, supporting the insertion of an
element (Insert) and the deletion of the minimum element (ExtractMin) in
worst-case O(log n) time. The binary heap structure is an implicit data structure,
i.e., it consists of an array of length n storing the elements, and no information is
stored between operations except for the array and the value n. Sometimes data
structures storing O(1) additional words are also called implicit. In this paper
we restrict our attention to strictly implicit priority queues, i.e., data structures
that do not store any information other than the array of elements and the
value n between operations.

Due to the Ω(n log n) lower bound on comparison-based sorting, either
Insert or ExtractMin must take Ω(log n) time, but not necessarily both.
Carlsson et al. [4] presented an implicit priority queue with worst-case O(1) and
O(log n) time Insert and ExtractMin operations, respectively. However, the
structure is not strictly implicit since it needs to store O(1) additional words.
Harvey and Zatloukal presented a strictly implicit priority structure
achieving the same bounds, but amortized. No previous strictly implicit priority
queue with matching worst-case time bounds is known.

* Work supported in part by the Danish National Research Foundation grant DNRF84 
through the Center for Massive Data Algorithmics. 



Table 1. Selected previous and new results for implicit priority queues. The bounds
are asymptotic, and * are amortized bounds.

A measurement often studied in implicit data structures and in-place
algorithms is the number of element moves performed during the execution of a
procedure. Franceschini showed how to sort n elements implicitly using O(n log n)
comparisons and O(n) moves [7], and Franceschini and Munro [5] presented
implicit dictionaries with amortized O(log n) time updates with amortized O(1)
moves per update. The latter immediately implies an implicit priority queue
with amortized O(log n) time Insert and ExtractMin operations performing
amortized O(1) moves per operation. No previous implicit priority queue with
O(1) time Insert operations achieving O(1) moves per operation is known.

For a more thorough survey of previous priority queue results, see pQ. 

Our Contribution We present two strictly implicit priority queues. The first
structure (Section 2) limits the number of moves to O(1) per operation with
amortized O(1) and O(log n) time Insert and ExtractMin operations,
respectively. However, the bounds are all amortized and it remains an open problem to
achieve these bounds in the worst case for strictly implicit priority queues. We
note that this structure implies a different way of sorting in-place with O(n log n)
comparisons and O(n) moves. The second structure (Section 3) improves over
the structures of Carlsson et al. [4] and Harvey and Zatloukal by achieving
Insert and ExtractMin operations with worst-case O(1)
and O(log n) time (and moves), respectively. The structure in Section 3 assumes
all elements to be distinct, whereas the structure in Section 2 can also be
extended to support identical elements (see the appendix). See Table 1 for a
comparison of new and previous results.

Preliminaries We assume the strictly implicit model as defined in [3] where we
are only allowed to store the number of elements n and an array containing
the n elements. Comparisons are the only allowed operations on the elements.
The number n is stored in a memory cell with Θ(log n) bits (word size) and
any operation usually found in a RAM is allowed for computations on n and
intermediate values. The number of moves is the number of writes to the array
storing the elements. That is, swapping two elements costs two moves.

A fundamental technique in the implicit model is to encode a 0/1-bit with a
pair of distinct elements (x, y), where the pair encodes 1 if x < y and 0 otherwise.

A binary heap is a complete binary tree structure where each node stores an
element and the tree satisfies heap order, i.e., the element at a non-root node
is larger than or equal to the element at the parent node. Binary heaps can be
generalized to d-ary heaps, where the degree of each node is d rather than
two. This implies O(log_d n) and O(d log_d n) time for Insert and ExtractMin,
respectively, using O(log_d n) moves for both operations.
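To make the d-ary heap bounds concrete, the following is a minimal Python sketch of an array-based d-ary min-heap that counts element moves (writes to the array). The class name DaryHeap, the moves counter, and the convention of charging one move per array write are illustrative assumptions, not part of the structures described in this paper.

```python
class DaryHeap:
    """Array-based d-ary min-heap that counts writes to the array as moves."""

    def __init__(self, d):
        self.d = d
        self.a = []
        self.moves = 0

    def insert(self, x):
        a, d = self.a, self.d
        a.append(x)                      # reserve a slot; final position is fixed below
        i = len(a) - 1
        while i > 0 and x < a[(i - 1) // d]:
            a[i] = a[(i - 1) // d]       # shift the parent down: one comparison per level
            self.moves += 1
            i = (i - 1) // d
        a[i] = x
        self.moves += 1

    def extract_min(self):
        a, d = self.a, self.d
        top = a[0]
        x = a.pop()                      # last element will refill the hole at the root
        n = len(a)
        if n:
            i = 0
            while True:
                first = d * i + 1
                if first >= n:
                    break
                # up to d comparisons per level to find the smallest child
                c = min(range(first, min(first + d, n)), key=a.__getitem__)
                if a[c] >= x:
                    break
                a[i] = a[c]
                self.moves += 1
                i = c
            a[i] = x
            self.moves += 1
        return top
```

Insert walks one root-to-leaf path with one comparison per level, while ExtractMin inspects up to d children per level, which is where the extra factor d in its running time comes from.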


2 Amortized O(1) moves


In this section we describe a strictly implicit priority queue supporting amortized
O(1) time Insert and amortized O(log n) time ExtractMin. Both operations
perform amortized O(1) moves. In Sections 2.1 to 2.3 we assume elements are
distinct. In Appendix A we describe how to handle identical elements.


Overview The basic idea of our priority queue is the following (the details are
presented in Section 2.1). The structure consists of four components: an insertion
buffer B of size O(log³ n); m insertion heaps I_1, I_2, ..., I_m each of size Θ(log³ n),
where m = O(n / log³ n); a singles structure T, of size O(n); and a binary heap Q,
storing {1, 2, ..., m} (integers encoded by pairs of elements) with the ordering
i ≤ j if and only if min I_i ≤ min I_j. Each I_i and B is a log n-ary heap of size
O(log³ n). The table below summarizes the performance of each component:


                 Insert              ExtractMin
Structure     Time      Moves     Time      Moves
B, I_i        1         1         log n     1
Q             log² n    log² n    log² n    log² n
T             log n     1         log n     1

It should be noted that the implicit dictionary of Franceschini and Munro [5] 
could be used for T, but we will give a more direct solution since we only need 
the restricted ExtractMin operation for deletions. 

The Insert operation inserts new elements into B. If the size of B becomes
Θ(log³ n), then m is incremented by one, B becomes I_m, m is inserted into Q,
and B becomes a new empty log n-ary heap. An ExtractMin operation first
identifies the minimum element in B, Q and T. If the overall minimum element e
is in B or T, e is removed from B or T. If the minimum element e resided in I_i,
where i is stored at the root of Q, then e and log² n further smallest elements
are extracted from I_i (if I_i is not empty) and all except e inserted into T (T
has cheap operations whereas Q does not, thus the expensive operation on Q is
amortized over inexpensive ones in T), and i is deleted from and reinserted into
Q with respect to the new minimum element in I_i. Finally e is returned.

For the analysis we see that Insert takes O(1) time and moves, except
when converting B to a new I_m and inserting m into Q. The O(log² n) time
and moves for this conversion is amortized over the insertions into B, which
becomes amortized O(1), since |B| = Ω(log² n). For ExtractMin we observe
that an expensive deletion from Q only happens once for every log² n-th element
from I_i (the remaining ones from I_i are moved to T and deleted from T), and
finally if there have been d ExtractMin operations, then at most d + m log² n
elements have been inserted into T, with a total cost of O((d + m log² n) log n) =
O(n + d log n), since m = O(n / log³ n).


2.1 The implicit structure

Fig. 1. The different structures and their layout in memory.


We now give the details of our representation (see Figure 1). We select one
element e_t as our threshold element, and denote elements greater than e_t as
dummy elements. The current number of elements in the priority queue is
denoted n. We fix an integer N that is an approximation of n, where N < n < 4N
and N = 2^j for some j. Instead of storing N, we store a bit r = ⌊log n⌋ - log N,
encoded by two dummy elements. We can then compute N as N = 2^(⌊log n⌋ - r),
where ⌊log n⌋ is the position of the most significant bit in the binary
representation of n (which we assume is computable in constant time). The value
r is easily maintained: When ⌊log n⌋ changes, r changes accordingly. We let
Δ = log(4N) = ⌊log n⌋ + 2 - r, i.e., Δ bits is sufficient to store an integer in the
range 0..n. We let M = ⌈4N/Δ³⌉.
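As a quick illustration of how these quantities follow from n and the stored bit r alone, here is a small Python sketch. The function name parameters and the use of Python's int.bit_length to obtain ⌊log n⌋ are illustrative choices; the paper only assumes that the most significant bit position is computable in constant time.

```python
def parameters(n, r):
    """Recompute N, Delta and M from the element count n and the stored bit r.

    Assumes n >= 1 and r in {0, 1}, with r = floor(log2 n) - log2 N.
    """
    msb = n.bit_length() - 1            # floor(log2 n)
    N = 1 << (msb - r)                  # N = 2^(floor(log n) - r)
    delta = msb + 2 - r                 # Delta = log2(4N)
    cube = delta ** 3
    M = (4 * N + cube - 1) // cube      # M = ceil(4N / Delta^3)
    return N, delta, M
```

For example, with n = 1,000,000 and r = 0 the sketch gives N = 524288, Δ = 21 and M = 227, and indeed N < n < 4N.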

We maintain the invariant that the size of the insertion buffer B satisfies
1 ≤ |B| ≤ 2Δ³, and that B is split into two parts B_1 and B_2, each being Δ-ary
heaps (B_2 possibly empty), where |B_1| = min{|B|, Δ³} and |B_2| = |B| - |B_1|. We
use two buffers to prevent expensive operation sequences that alternate inserting
and deleting the same element. We store a bit b indicating if B_2 is nonempty, i.e.,
b = 1 if and only if |B_2| ≠ 0. The bit b is encoded using two dummy elements.
The structures I_1, I_2, ..., I_m are Δ-ary heaps storing Δ³ elements. The binary
heap Q is stored using two arrays Q_h and Q_rev each of a fixed size M ≥ m and
storing integers in the range 1..m. Each value in both arrays is encoded using
2Δ dummy elements, i.e., Q is stored using 4MΔ dummy elements. The first
m entries of Q_h store the binary heap, whereas Q_rev acts as reverse pointers,
i.e., if Q_h[j] = i then Q_rev[i] = j. All operations on a regular binary heap take
O(log n) time, but since each “read”/“write” from/to Q needs to decode/encode
an integer the time increases by a factor 2Δ. It follows that Q supports Insert
and ExtractMin in O(log² n) time, and FindMin in O(log n) time.
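The pair-encoding technique from the Preliminaries extends directly to Δ-bit integers stored in 2Δ elements, which is how the entries of Q_h and Q_rev are kept. The Python sketch below shows one way this could look; the function names, the least-significant-bit-first order, and the parameter name delta are assumptions made for illustration.

```python
def encode_bit(arr, pos, bit):
    """Order arr[pos], arr[pos+1] (distinct) so the pair encodes `bit`.

    The pair encodes 1 if arr[pos] < arr[pos+1] and 0 otherwise.
    Returns the number of moves (array writes) used: 0 or 2.
    """
    x, y = arr[pos], arr[pos + 1]
    if (x < y) != bool(bit):
        arr[pos], arr[pos + 1] = y, x   # one swap costs two moves
        return 2
    return 0


def decode_bit(arr, pos):
    return 1 if arr[pos] < arr[pos + 1] else 0


def encode_int(arr, pos, value, delta):
    """Store a delta-bit integer in the 2*delta elements starting at pos."""
    moves = 0
    for b in range(delta):              # bit b lives in the b-th pair
        moves += encode_bit(arr, pos + 2 * b, (value >> b) & 1)
    return moves


def decode_int(arr, pos, delta):
    """Read back the integer with delta comparisons and no moves."""
    return sum(decode_bit(arr, pos + 2 * b) << b for b in range(delta))
```

Reading a value costs Δ comparisons and writing costs up to 2Δ moves, which matches the factor 2Δ overhead on each access to Q; the multiset of elements in the array never changes.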

We now describe T and we need the following density maintenance result. 

Lemma 1 ([2]). There is a dynamic data structure storing n comparable
elements in an array of length (1 + ε)n, supporting Insert and ExtractMin in
amortized O(log² n) time and FindPredecessor in worst case O(log n) time.
FindPredecessor does not modify the array.

Corollary 1. There is an implicit data structure storing n (key, index) pairs,
while supporting Insert and ExtractMin in amortized O(log³ n) time and
moves, and FindPredecessor in O(log n) time in an array of length Δ(2+ε)n.

Proof. We use the structure from Lemma 1 to store pairs of a key and an index,
where the index is encoded using 2Δ dummy elements. All additional space
is filled with dummy elements. However, comparisons are only made on keys
and not indexes, which means we retain O(log n) time for FindMin. Since the
stored elements are now an O(Δ) = Θ(log n) factor larger, the time for update
operations becomes an O(log n) factor slower giving amortized O(log³ n) time
for Insert and ExtractMin. □

The singles structure T intuitively consists of a sorted list of the elements
stored in T partitioned into buckets D_1, ..., D_q of size at most Δ³, where the
minimum element e from bucket D_i is stored in a structure S from Corollary 1
as the pair (e, i). Each D_i is stored as a Δ-ary heap of size Δ³, where empty slots
are filled with dummy elements. Recall implicit heaps are complete trees, which
means all dummy elements in D_i are stored consecutively after the last
non-dummy element. In S we consider pairs (e, i) where e > e_t to be empty spaces.

More specifically, the structure T consists of: q, S, D_1, D_2, ..., D_K, where
K = ⌈N/(16Δ³)⌉ ≥ q is the number of D_i's available. The structure S uses ⌈N/(4Δ²)⌉
elements and q uses 2Δ elements to encode a pointer. Each D_i uses Δ³ elements.

The D_i's and S relate as follows. The number of D_i's is at most the maximum
number of items that can be stored in S. Let (e, i) ∈ S, then ∀x ∈ D_i : e < x,
and furthermore for any (e', i') ∈ S with e < e' we have ∀x ∈ D_i : x < e'. These
invariants do not apply to dummy elements. Since D_i is a Δ-ary heap with Δ³
elements we get O(log_Δ Δ³) = O(1) time for Insert and O(Δ log_Δ Δ³) = O(Δ)
for ExtractMin on a D_i.

2.2 Operations 

For both Insert and ExtractMin we need to know N, Δ, and whether there
are one or two insert buffers as well as their sizes. First r is decoded and we
compute Δ = 2 + msb(n) - r, where msb(n) is the position of the most significant
bit in the binary representation of n (indexed from zero). From this we compute
N = 2^(Δ-2), K = ⌈N/(16Δ³)⌉, and M = ⌈4N/Δ³⌉. By decoding b we get the
number of insert buffers. To find the sizes of B_1 and B_2 we compute the value
i_start which is the index of the first element in I_1. The size of B_1 is computed
as follows. If (n - i_start) mod Δ³ = 0 then |B_1| = Δ³. If B_2 exists then B_1
starts at n - 2Δ³ and otherwise B_1 starts at n - Δ³. If B_2 exists and
(n - i_start) mod Δ³ = 0 then |B_2| = Δ³, otherwise |B_2| = (n - i_start) mod Δ³. Once
all of this information is computed the actual operation can start. If n = N + 1
and an ExtractMin operation is called, then the ExtractMin procedure is
executed and afterwards the structure is rebuilt as described in the paragraph
below. Similarly if n = 4N - 1 before an Insert operation the new element is
appended and the data structure is rebuilt.

Insert If |B_1| < Δ³ the new element is inserted in B_1 by the standard insertion
algorithm for Δ-ary heaps. If |B_1| = Δ³ and |B_2| = 0 and a new element is
inserted the two elements in b are swapped to indicate that B_2 now exists.
When |B_1| = |B_2| = Δ³ and a new element is inserted, B_1 becomes I_{m+1}, B_2
becomes B_1, m + 1 is inserted in Q (possibly requiring O(log n) values in Q_h and
Q_rev to be updated in O(log² n) time). Finally the new element becomes B_2.

ExtractMin Searches for the minimum element e are performed in B_1, B_2, S,
and Q. If e is in B_1 or B_2 it is deleted, the last element in the array is swapped
with the now empty slot and the usual bubbling for heaps is performed. If B_2
disappears as a result, the bit b is updated accordingly. If B_1 disappears as a
result, I_m becomes B_1, and m is removed from Q.

If e is in I_i then i is deleted from Q, e is extracted from I_i, and the last element
in the array is inserted in I_i. The Δ² smallest elements in I_i are extracted and
inserted into the singles structure: for each element a search in S is performed
to find the range it belongs to, i.e. D_j, the structure it is to be inserted in.
Then it is inserted in D_j (replacing a dummy element that is put in I_i, found by
binary search). If |D_j| = Δ³ and q = K the priority queue is rebuilt. Otherwise
if |D_j| = Δ³, D_j is split in two by finding the median y of D_j using a linear
time selection algorithm [5]. Elements ≥ y in D_j are swapped with the first
Δ³/2 elements in D_q, then D_j and D_q are made into Δ-ary heaps by repeated
insertion. Then y is extracted from D_q and (y, q) is inserted in S. The dummy
element pushed out of S by y is inserted in D_q. Finally q is incremented and we
reinsert i into Q. Note that it does not matter if any of the elements in I_i are
dummy elements, the invariants are still maintained.

If (e, i) ∈ S, the last element of the array is inserted into the singles structure,
which pushes out a dummy element z. The minimum element y of D_i is extracted
and z inserted instead. We replace e by y in S. If y is a dummy element, we
update S as if (y, i) was removed. Finally e is returned. Note this might make
B_1 or B_2 disappear as a result and the steps above are executed if needed.

Rebuilding We let the new N = n'/2, where n' is n rounded to the nearest power
of two. Using a linear time selection algorithm [5], find the element with rank
n - i_start; this element is the new threshold element e_t, and it is put in the first
position of the array. Following e_t are all the elements greater than e_t and they
are followed by all the elements comparing less than e_t. We make sure to have
at least Δ³/2 elements in B_1 and at most Δ³/2 elements in B_2 which dictates
whether b encodes 0 or 1. The value q is initialized to 1. All the D_i structures
are considered empty since they only contain dummy elements. The pointers in
Q_h and Q_rev are all reset to the value 0. All the I_i structures as well as B_1
(and possibly B_2) are made into Δ-ary heaps with the usual heap construction
algorithm. For each I_j structure the Δ² smallest elements are inserted in the
singles structure as described in the ExtractMin procedure, and j is inserted
into Q. The structure now satisfies all the invariants.

2.3 Analysis 

In this subsection we give the analysis that leads to the following theorem. 

Theorem 1. There is a strictly implicit priority queue supporting Insert in
amortized O(1) time, ExtractMin in amortized O(log n) time. Both operations
perform amortized O(1) moves.

Insert While |B| < 2Δ³, each insertion takes O(1) time. When an insertion
happens and |B| = 2Δ³, the insertion into Q requires O(log² n) time and moves.
During a sequence of s insertions, this can at most happen ⌈s/Δ³⌉ times, since
|B| can only increase for values above Δ³ by insertions, and each insertion at
most causes |B| to increase by one. The total cost for s insertions is
O(s + s/Δ³ · log² n) = O(s), i.e., amortized constant per insertion.

ExtractMin We first analyze the cost of updating the singles structure. Each
operation on a D_i takes time O(Δ) and performs O(1) moves. Locating an
appropriate bucket using S takes O(log n) time and no moves. At least Ω(Δ³)
operations must be performed on a bucket to trigger an expensive bucket split
or bucket elimination in S. Since updating S takes O(log³ n) time, the
amortized cost for updating S is O(1) moves per insertion and extraction from the
singles structure. In total the operations on the singles structure require
amortized O(log n) time and amortized O(1) moves. For ExtractMin the searches
performed all take O(log n) comparisons and no moves. If B_1 disappears as a
result of an extraction we know at least Ω(Δ³) extractions have occurred
because a rebuild ensures |B_1| ≥ Δ³/2. These extractions pay for extracting I_m
from Q_h which takes O(log² n) time and moves, amortized this gives O(1/log n)
additional time and moves. If the extracted element was in I_i for some i, then
Δ² insertions occur in the singles structure each taking O(log n) time and O(1)
moves amortized. If that happens either Ω(Δ³) insertions or Δ² extractions have
occurred: Suppose no elements from I_i have been inserted in the singles
structure, then the reason there is a pointer to I_i in Q_h is due to Ω(Δ³) insertions.
When inserting elements in the singles structure from I_i the number of elements
inserted is Δ² and these must first be deleted. From this discussion it is evident
that we have saved up Ω(Δ²) moves and Ω(Δ³) time, which pay for the
expensive extraction. Finally if the minimum element was in S, then an extraction
on a Δ-ary heap is performed which takes O(Δ) time and O(1) moves, since its
height is O(1).

Rebuilding The cost of rebuilding is O(n), due to a selection and building heaps
with O(1) height. There are three reasons a rebuild might occur: (i) n became
4N, (ii) n became N - 1, or (iii) an insertion into T would cause q > K. By the
choice of N during a rebuild it is guaranteed that in the first and second case at
least Ω(N) insertions or extractions occurred since the last rebuild, and we have
thus saved up at least Ω(N) time and moves. For the last case we know that
each extraction incurs O(1) insertions in the singles structure in an amortized
sense. Since the singles structure accommodates Ω(N) elements and a rebuild
ensures the singles structure has o(n) non-dummy elements (Lemma 2), at least
Ω(N) extractions have occurred which pay for the rebuild.

Lemma 2. Immediately after a rebuild o(n) elements in the singles structure
are non-dummy elements.

Proof. There are at most n/Δ³ of the I_i structures and Δ² elements are inserted
in the singles structure from each I_i, thus at most n/Δ = o(n) non-dummy
elements reside in the singles structure after a rebuild. □

The paragraphs above establish Theorem 1.

3 Worst case solution 

In this section we present a strictly implicit priority queue supporting Insert 
in worst-case 0(1) time and ExtractMin in worst-case O(logn) time (and 
moves). The data structure requires all elements to be distinct. The main concept 
used is a variation on binomial trees. The priority queue is a forest of O(logn) 
such trees. We start with a discussion of the variant we call relaxed binomial
trees, then we describe how to maintain a forest of these trees in an amortized
sense, and finally we give the deamortization.

3.1 Relaxed binomial tree 

Binomial trees are defined inductively: A single node is a binomial tree of size
one and the node is also the root. A binomial tree of size 2^(i+1) is made by linking
two binomial trees T_1 and T_2 both of size 2^i, such that one root becomes the
rightmost child of the other root. We lay out in memory a binomial tree of size 2^i
by a preorder traversal of the tree where children are visited in order of increasing
size, i.e. c_0, c_1, ..., c_{i-1}. This layout is also described in [1]. See Figure 2 for an
illustration of the layout. In a relaxed binomial tree (RBT) each node stores an
element, satisfying the following order: Let p be a node with i children, and let
c_j be a child of p. Let T_{c_j} denote the set of elements in the subtree rooted at c_j.
We have the invariant that the element of c_ℓ is less than either all elements in T_{c_ℓ}
or less than all elements in ∪_{j<ℓ} T_{c_j} (see Figure 2). In particular we have the
requirement that the root must store the smallest element in the tree. In each
node we store a flag indicating in which direction the ordering is satisfied. Note
that linking two adjacent RBTs of equal size can be done in O(1) time: compare
the keys of the two roots, if the lesser is to the right, swap the two nodes and
finally update the flags to reflect the changes as just described.
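The preorder layout has a convenient closed form: in a tree of size 2^i whose root sits at position k, child c_j sits at position k + 2^j. The Python sketch below checks this against an explicit preorder traversal. The function names and the 1-indexed positions (matching the node numbers used in Figure 2) are illustrative assumptions.

```python
def child_positions(k, i):
    """Positions of children c_0, ..., c_{i-1} of a root at position k."""
    return [k + 2**j for j in range(i)]


def preorder_layout(i, start=1, out=None):
    """Lay out a binomial tree of size 2^i in preorder, smallest child first.

    Returns a dict mapping each node position to the list of its children.
    """
    out = {} if out is None else out
    out[start] = []
    nxt = start + 1
    for j in range(i):                 # child c_j roots a binomial tree of size 2^j
        out[start].append(nxt)
        preorder_layout(j, nxt, out)
        nxt += 2**j
    return out
```

For a tree of size 16 rooted at position 1 this yields children at positions 2, 3, 5 and 9, and the node at position 9 has children at 10, 11 and 13.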

For an unrelated technical purpose we also need to store whether a node is
the root of a RBT. This information is encoded using three elements per node
(allowing 3! = 6 permutations, and we only need to differentiate between three
states per node: “root”, “minimum of its own subtree”, or “minimum among
strictly smaller subtrees”).
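One simple way to realize a three-state flag with three distinct elements is to let the position of the smallest of the three encode the state. The Python sketch below illustrates this; the specific mapping, the state labels "own" and "smaller", and the function names are hypothetical, since this section does not specify which permutations the paper assigns to which state or which of the three elements acts as the node's key.

```python
# Hypothetical labels: "own" = minimum of its own subtree,
# "smaller" = minimum among strictly smaller subtrees.
STATES = ("root", "own", "smaller")


def read_state(arr, pos):
    """Decode the state from the offset of the minimum of arr[pos:pos+3]."""
    triple = arr[pos:pos + 3]
    return STATES[triple.index(min(triple))]


def write_state(arr, pos, state):
    """Rotate the triple so that its minimum lands at offset STATES.index(state)."""
    triple = arr[pos:pos + 3]
    shift = (STATES.index(state) - triple.index(min(triple))) % 3
    arr[pos:pos + 3] = [triple[(j - shift) % 3] for j in range(3)]
```

Reading a state costs a constant number of comparisons and changing it costs at most three moves, while the three stored elements themselves are preserved.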



Fig. 2. An example of an RBT on 16 elements (a,b,...,o). The layout in memory of an
RBT and a regular binomial tree is the same. Note here that node 9 has element c and
is not the minimum of its subtree because node 11 has element b, but c is the minimum
among the subtrees rooted at nodes 2, 3, and 5 (c_0, c_1, and c_2). Note also that node 5
is the minimum of its subtree but not the minimum among the trees rooted at nodes
2 and 3, which means only one state is valid. Finally node 3 is the minimum of both
its own subtree and the subtree rooted at node 2, which means both states are valid
for that node.


To extract the minimum element of an RBT it is replaced by another
element. The reason for replacing is that the forest of RBTs is implicitly maintained
in an array and elements are removed from the right end, meaning only an
element from the last RBT is removed. If the last RBT is of size 1, it is trivial
to remove the element. If it is larger, then we decompose it. We first describe
how to perform a Decompose operation which changes an RBT of size 2^i into
i structures T_{i-1}, ..., T_1, T_0, where |T_j| = 2^j. Then we describe how to
perform ReplaceMin which takes one argument, a new element, and extracts the
minimum element from an RBT and inserts the argument in the same structure.

A Decompose procedure is essentially reversing insertions. We describe a
tail recursive procedure taking as argument a node r. If the structure is of size
one, we are done. If the structure is of size 2^i the (i-1)th child, c_{i-1}, of r is
inspected, if it is not the minimum of its own subtree, the element of c_{i-1} and
r are swapped. The (i-1)th child should now encode “root”, that way we have
two trees of size 2^(i-1) and we recurse on the subtree to the right in the memory
layout. This procedure terminates in O(i) steps and gives i+1 structures of sizes
2^(i-1), 2^(i-2), ..., 2, 1, and 1 laid out in decreasing order of size (note there are two
structures of size 1). This enables easy removal of a single element.
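The Link and Decompose steps can be sketched in Python as follows. This is a simplified model under stated assumptions: each node holds a single key, the per-node flag is kept in a separate list instead of being encoded in a permutation of three elements, positions are 0-indexed, and the names link, decompose, build and the flag values "own" and "left" are hypothetical.

```python
ROOT, OWN, LEFT = "root", "own", "left"
# OWN:  the node's key is the minimum of its own subtree.
# LEFT: the node's key is smaller than every key in its smaller sibling subtrees.


def link(keys, flags, k, size):
    """Link two adjacent RBTs of `size` nodes rooted at k and k + size."""
    r = k + size
    if keys[r] < keys[k]:
        keys[k], keys[r] = keys[r], keys[k]
        flags[r] = LEFT    # old left root: smaller than all of keys[k+1:r]
    else:
        flags[r] = OWN     # right root unchanged: minimum of keys[r:r+size]
    flags[k] = ROOT


def decompose(keys, flags, k, size):
    """Split the RBT in keys[k:k+size] into RBTs of sizes size/2, ..., 2, 1, 1.

    Returns the root positions of the resulting structures, left to right.
    """
    roots = [k]
    while size > 1:
        size //= 2
        c = k + size                   # largest child of the current root
        if flags[c] == LEFT:           # not known to be the minimum of its subtree
            keys[k], keys[c] = keys[c], keys[k]
        flags[c] = ROOT
        roots.append(c)
        k = c                          # continue in the right half
    return roots


def build(keys):
    """Build one RBT over len(keys) = 2^i keys by repeated linking."""
    n = len(keys)
    flags = [ROOT] * n
    size = 1
    while size < n:
        for k in range(0, n, 2 * size):
            link(keys, flags, k, size)
        size *= 2
    return flags
```

After build, the smallest key is at position 0; after decompose, every returned root holds the minimum of its own block, so the rightmost single-node structure can be removed without disturbing the others.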

The ReplaceMin operation works similarly to the Decompose, where
instead of always recursing on the right, we recurse where the minimum element is
the root. When the recursion ends, the minimum element is now in a structure
of size 1, which is deleted and replaced by the new element. The decomposition
is then reversed by linking the RBTs using the Link procedure. Note it is
possible to keep track of which side was recursed on at every level with O(log n)
extra bits, i.e. O(1) words. The operation takes O(log n) steps and correctness
follows by the Decompose and Link procedures. This concludes the description
of RBTs and yields the following theorem.

Theorem 2. On an RBT with 3 · 2^i elements, Link and FindMin can be
supported in O(1) time and Decompose and ReplaceMin in O(i) time.

3.2 How to maintain a forest 

As mentioned our priority queue is a forest of the relaxed binomial trees from
Theorem 2. An easy amortized solution is to store one structure of size 3 · 2^j
for every set bit j in the binary representation of ⌊n/3⌋. During an insertion
this could cause O(log n) Link operations, but by a similar argument to that of
binary counting, this yields O(1) amortized insertion time. We are aiming for a
worst case constant time solution so we maintain the invariant that there are at
most 5 structures of size 2^i for i = 0, 1, ..., ⌊log n⌋. This enables us to postpone
some of the Link operations to appropriate times. We are storing O(log n) RBTs,
but we do not store which sizes we have, this information must be decodable
in constant time since we do not allow storing additional words. Recall that
we need 3 elements per node in an RBT, thus in the following we let n be the
number of elements and N = ⌊n/3⌋ be the number of nodes. We say a node is
in node position k if the three elements in it are in positions 3k - 2, 3k - 1, and
3k. This means there is a buffer of 0, 1, or 2 elements at the end of the array.
When a third element is inserted, the elements in the buffer become an RBT
with a single node and the buffer is now empty. If an Insert operation does not
create a new node, the new element is simply appended to the buffer. We are
not storing the structure of the forest (i.e. how many RBTs of size 2^j exist for
each j), since that would require additional space. To be able to navigate the
forest we need the following two lemmas.

Lemma 3. There is a structure of size 2^i at node positions k, k+1, ..., k + 2^i - 1
if and only if the node at position k encodes “root”, the node at position k + 2^i
encodes “root” and the node at position k + 2^(i-1) encodes “not root”.

Proof. It is trivially true that the mentioned nodes encode “root”, “root” and
“not root” if an RBT with 2^i nodes is present in those locations.

We first observe there cannot be a structure of size 2^(i-1) starting at position k,
since that would force the node at position k + 2^(i-1) to encode “root”. Also all
structures between k and N must have less than 2^i elements, since both nodes
at positions k and k + 2^i encode “root”. We now break the analysis in a few
cases and the lemma follows from a proof by contradiction. Suppose there is a
structure of size 2^(i-2) starting at k, then for the same reason as before there
cannot be another one of size 2^(i-2). Similarly, there can at most be one structure
of size 2^(i-3) following that structure. Now we can bound the total number of
nodes from position k onwards in the structure as:
2^(i-2) + 2^(i-3) + 5 Σ_{j=0}^{i-4} 2^j = 2^i - 5 < 2^i,
which is a contradiction. So there cannot be a structure of size 2^(i-2)
starting at position k. Note there can at most be three structures of size 2^(i-3)
starting at position k, and we can again bound the total number of nodes as:
3 · 2^(i-3) + 5 Σ_{j=0}^{i-4} 2^j = 2^i - 5 < 2^i, again a contradiction. □

Lemma 4. If there is an RBT with 2^i nodes the root is in position N - 2^i·k - x + 1
for k = 1, 2, 3, 4 or 5 and x = N mod 2^i.

Proof. There are at most 5 · 2^i - 5 nodes in structures of size ≤ 2^(i-1). All structures
of size ≥ 2^i contribute 0 to x, thus the number of nodes in structures with ≤ 2^(i-1)
nodes must be x counting modulo 2^i. This gives exactly the five possibilities for
where the first tree of size 2^i can be. □
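Lemmas 3 and 4 together give a constant-time probe per size: compute the five candidate positions from Lemma 4 and test each with the three flags from Lemma 3. The Python sketch below works on a plain list of root flags instead of permutation-encoded nodes. The function names, the helper root_flags, and the convention that position N + 1 (just past the last node) counts as a root are assumptions made for illustration.

```python
def candidates(N, i):
    """Lemma 4: possible root positions of a structure with 2^i nodes."""
    x = N % 2**i
    return [N - 2**i * k - x + 1 for k in range(1, 6)]


def is_structure(is_root, N, k, i):
    """Lemma 3 test for a structure of 2^i nodes at node position k (1-indexed)."""
    def root(p):
        # Assumption: the position just past the last node acts as a root.
        return p == N + 1 or (1 <= p <= N and is_root[p])

    if k < 1 or k + 2**i > N + 1:
        return False
    if not root(k) or not root(k + 2**i):
        return False
    return i == 0 or not root(k + 2**(i - 1))


def find_structures(is_root, N, i):
    """Lemma 4 candidates that pass the Lemma 3 test, rightmost first."""
    return [p for p in candidates(N, i) if is_structure(is_root, N, p, i)]


def root_flags(sizes):
    """Hypothetical helper: root flags for structures laid out left to right."""
    flags = [False] * (sum(sizes) + 1)   # index 0 is unused
    pos = 1
    for s in sizes:
        flags[pos] = True
        pos += s
    return flags
```

For a forest with structure sizes 4, 2, 2, 1 (so N = 9), the probe reports a structure of size 1 at position 9, structures of size 2 at positions 7 and 5, a structure of size 4 at position 1, and none of size 8.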

We now describe how to perform an ExtractMin. First, if there is no buffer
(n mod 3 = 0) then Decompose is executed on the smallest structure. We apply
Lemma 4 iteratively for i = 0 to ⌊log N⌋ and use Lemma 3 to find structures of
size 2^i. If there is a structure we call the FindMin procedure (i.e. inspect the
element of the root node) and remember which structure the minimum element
resides in. If the minimum element is in the buffer, it is deleted and the rightmost
element is put in the empty position. If there is no buffer, we are guaranteed due
to the first step that there is a structure with 1 node, which is now the buffer.
On the structure with the minimum element ReplaceMin is called with the
rightmost element of the array. The running time is O(log n) for finding all the
structures, O(log n) for decomposing the smallest structure and O(log n) for the
ReplaceMin procedure, in total we get O(log n) for ExtractMin.

The Insert procedure is simpler but the correctness proof is somewhat
involved. A new element is inserted in the buffer, if the buffer becomes a node,
then the least significant bit i of N is computed. If at least two structures of size
2^i exist (found using the two lemmas above), then they are linked and become
one structure of size 2^(i+1).
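The effect of this linking rule on the number of structures per size can be explored with a small Python simulation of insert-only sequences. The sketch tracks only how many structures of each size exist, not their contents, and it assumes that N denotes the node count after the new node has been created and that i is the position of the least significant set bit of N; both are interpretations of the description above, and the function name is hypothetical.

```python
def simulate_inserts(num_nodes):
    """Simulate node creations under the linking rule.

    Returns (counts, worst), where counts[i] is the final number of
    structures with 2^i nodes and worst is the largest count observed
    for any size after any insertion.
    """
    counts = {}
    worst = 0
    for N in range(1, num_nodes + 1):
        counts[0] = counts.get(0, 0) + 1          # the buffer became a new node
        i = (N & -N).bit_length() - 1             # least significant set bit of N
        if counts.get(i, 0) >= 2:                 # link two structures of size 2^i
            counts[i] -= 2
            counts[i + 1] = counts.get(i + 1, 0) + 1
        worst = max(worst, max(counts.values()))
    return counts, worst
```

Each insertion performs at most one Link, and in this insert-only setting the number of structures of any one size stays within the bound of five.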

Lemma 5. The Insert and ExtractMin procedures maintain that at most
five structures of size 2^i exist for all i ≤ ⌊log n⌋.

Proof. Let N_{≤i} be the total number of nodes in structures of size ≤ 2^i. Then
the following is an invariant for i = 0, 1, ..., ⌊log N⌋.

N_{≤i} + (2^(i+1) - ((N + 2^i) mod 2^(i+1))) ≤ 6 · 2^i - 1

The invariant states that N_{≤i} plus the number of inserts until we try to link
two trees of size 2^i is at most 6 · 2^i - 1. Suppose that a new node is inserted
and i is not the least significant bit of N, then N_{≤i} increases by one and so does
(N + 2^i) mod 2^(i+1), which means the invariant is maintained. Suppose that i is
the least significant bit in N (i.e. we try to link structures of size 2^i) and there
are at least two structures of size 2^i, then the insertion makes N_{≤i} decrease by
2 · 2^i - 1 = 2^(i+1) - 1 and 2^(i+1) - ((N + 2^i) mod 2^(i+1)) increases by 2^(i+1) - 1, since
(N + 2^i) mod 2^(i+1) becomes zero, which means the invariant is maintained. Now
suppose there is at most one structure of size 2^i and i is the least significant bit of
N. We know by the invariant that
N_{≤i-1} + (2^i - ((N + 2^(i-1)) mod 2^i)) ≤ 6 · 2^(i-1) - 1,
which implies N_{≤i-1} ≤ 6 · 2^(i-1) - 1 - 2^i + 2^(i-1) = 5 · 2^(i-1) - 1. Since we assumed there
is at most one structure of size 2^i we get that
N_{≤i} ≤ 2^i + N_{≤i-1} ≤ 2^i + 5 · 2^(i-1) - 1 = 3.5 · 2^i - 1.
Since N mod 2^(i+1) = 2^i (i is the least significant bit of N) we have
N_{≤i} + (2^(i+1) - ((N + 2^i) mod 2^(i+1))) ≤ 3.5 · 2^i - 1 + 2^(i+1) = 5.5 · 2^i - 1 < 6 · 2^i - 1.

The invariant is also maintained when deleting: for each $i$ where $N_i > 0$ before
the ExtractMin, $N_i$ decreases by one. For all $i$ the second term increases by at
most one, and possibly decreases by $2^{i+1} - 1$. Thus the invariant is maintained
for all $i$ where $N_i > 0$ before the procedure. If $N_i = 0$ before an ExtractMin,
we get $N_j = 2^{j+1} - 1$ for $j < i$. Since the second term can contribute at most
$2^{j+1}$, we get $N_j + (2^{j+1} - ((N + 2^j) \bmod 2^{j+1})) \le 2^{j+1} - 1 + 2^{j+1} \le 6 \cdot 2^j - 1$,
thus the invariant is maintained. □
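
The invariant can be sanity-checked numerically. The following is a minimal sketch under simplifying assumptions that are not part of the paper: only Insert operations are simulated, the buffer is ignored (every inserted element immediately counts as a structure of size 1), and structures are tracked only by their sizes. The name `simulate_inserts` is hypothetical.

```python
def simulate_inserts(num_inserts: int):
    """Insert-only simulation; returns (invariant_ok, max structures of one size)."""
    counts = {}            # counts[i] = number of structures of size 2^i
    invariant_ok = True
    worst = 0
    for n in range(1, num_inserts + 1):
        # Simplification: the new element is a structure of size 1 right away.
        counts[0] = counts.get(0, 0) + 1
        # i = least significant set bit of N; link two structures of size 2^i.
        i = (n & -n).bit_length() - 1
        if counts.get(i, 0) >= 2:
            counts[i] -= 2
            counts[i + 1] = counts.get(i + 1, 0) + 1
        # Check N_{<=i} + (2^(i+1) - ((N + 2^i) mod 2^(i+1))) <= 6 * 2^i - 1
        # for every level i = 0 .. floor(log2 N).
        nodes_le = 0
        for level in range(n.bit_length()):
            nodes_le += counts.get(level, 0) << level
            modulus = 1 << (level + 1)
            second_term = modulus - ((n + (1 << level)) % modulus)
            if nodes_le + second_term > 6 * (1 << level) - 1:
                invariant_ok = False
        worst = max(worst, max(counts.values()))
    return invariant_ok, worst


ok, worst = simulate_inserts(2000)
print(ok, worst <= 5)
```

The check covers only the insertion cases of the proof; the deletion case depends on how ExtractMin decomposes structures and is not modeled here.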

Correctness and running times of the procedures have now been established. 

A Handling identical elements in the amortized case

The primary difficulty in handling identical elements is that we lose the ability 
to encode bits. The primary goal of this section is to do so anyway. The idea is 
to let the items stored in the priority queue be pairs of distinct elements where 
the key of an item is the lesser element in the pair. In the case where it is not 
possible to make a sufficient number of pairs of distinct elements, almost all 
elements are equal and this is an easy case to handle. Note that many pairs (or 
all for that matter) can contain the same elements, but each pair can now encode 
a bit, which is sufficient for our purposes. 

The structure is almost the same as before, but with a few additions.
As mentioned, we need to use pairs of distinct elements, so we
create a mechanism to produce these. Furthermore, we need to do some
bookkeeping, such as storing a pointer and being able to compute whether there are
enough pairs of distinct elements to actually have a meaningful structure. The
changes to the memory layout are illustrated in Figure 3.


[Figure 3 omitted; recoverable label: "One pair"]

Fig. 3. The different structures and their layout in memory.


Modifications The areas L and B' in memory are used to produce pairs of distinct
elements. The area $p_L$ is a Gray-coded pointer¹ with $O(\log n)$ pairs, pointing
to the beginning of L. The rest of the structure is essentially the same as before,
except that instead of storing elements, we now store pairs $e = (e_1, e_2)$, and the key of
the pair is $e_k = \min\{e_1, e_2\}$. All comparisons between items are thus made with
the key of the pair. We will refer to the priority queue from Section 2 as PQ.

There are a few minor modifications to PQ. Recall that we needed to simulate
empty spaces inside T (specifically in S, see Figure 1). The way we simulated
empty spaces was by having elements that compared greater than $e_t$. Now $e_t$ is
actually a pair, where the minimum element is the threshold element. It might
be the case that there are many items comparing equal to $e_t$, which means some
would be used to simulate empty spaces, others would be actual elements in
PQ, and some would be used to encode pointers. This means we need to be able
to differentiate these types, which might all compare equal to $e_t$. First observe that
items used for pointers are always located in positions that are distinguishable
from items placed in positions used as actual items. Thus we do not need to
worry about confusing those two. Similarly, the "empty" spaces in T are also
located in positions that are distinguishable from pointers. Now we only need to
be able to differentiate "empty" spaces and occupied spaces where the keys both
compare equal to $e_t$. Letting items (i.e., pairs) used as empty spaces encode 1,
and the "occupied" spaces encode 0, empty spaces and occupied spaces become
differentiable as well. Encoding that bit is possible, since they are not used for
encoding anything else.

¹ Gray, F.: Pulse code communications. U.S. Patent (2632058) (1953)

Since many elements could now be identical, we need to decide whether there
are enough distinct elements to have a meaningful structure. As an invariant,
if the two elements in the pair $e_t = (e_{t,1}, e_{t,2})$ are equal, then there are
not enough elements to make $\Omega(\log n)$ pairs of distinct elements. The $O(\log n)$
elements that are different from the majority are then stored at the end of the
array. After every $(\log n)$th insertion it is easy to check whether there are now sufficient
elements to make $\ge c \log n$ pairs for some appropriately large and fixed $c$. When
that happens, the structure in Figure 3 is formed, and $e_t$ must now contain two
distinct elements, with the lesser being the threshold key. Note also that while
$e_{t,1} = e_{t,2}$, an ExtractMin procedure simply needs to scan the last $\le c \log n$
elements and possibly make one swap to return the minimum and fill the empty
index.
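
The degenerate case can be sketched as follows. This is only an illustration under stated assumptions: the array is a Python list, the parameter `tail` stands in for the bound $c \log n$, every element that differs from the majority value lies within the last `tail` positions, and the function name `extract_min_degenerate` is hypothetical.

```python
def extract_min_degenerate(a: list, tail: int):
    """ExtractMin while almost all elements are equal.

    Assumes every element differing from the majority value is among the
    last `tail` positions of `a`.
    """
    if not a:
        raise IndexError("extract from an empty queue")
    n = len(a)
    # Scan the tail plus one position in front of it, so that a majority
    # element (if any lies outside the tail) is also considered.
    start = max(0, n - tail - 1)
    j = min(range(start, n), key=a.__getitem__)
    # One swap moves the minimum to the last index, which is then removed.
    a[j], a[n - 1] = a[n - 1], a[j]
    return a.pop()


queue = [5, 5, 5, 5, 5, 5, 9, 2, 7]
extracted = [extract_min_degenerate(queue, 3) for _ in range(9)]
print(extracted)
```

The swap keeps the assumed layout intact: the element moved to index j lands no further than `tail` positions from the new end of the array.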


Insert The structure B' is a list of single elements which functions as an insertion
buffer; that is, elements are simply appended to B' when inserted. Whenever
$n \bmod \log n = 0$, a procedure making pairs is run. At this point we have time to
decode $p_L$, and up to $O(\log n)$ new pairs can be made using L and B'. To make
pairs, B' is read: all elements in B' that are equal to elements in L are put after
L, and the rest of the elements in B' are used to create pairs using one element from
L and one element from B'. If there are more elements in B', they can be used to
make pairs on their own. These pairs are then inserted into PQ. To make room
for the newly inserted pairs, L might have to move right and we might have to
update $p_L$. Since $p_L$ is a Gray-coded pointer, we only need as many bit changes
as there are pairs inserted in PQ, ensuring $O(1)$ amortized moves. Note that the
size of PQ is now the value of $p_L$, which means all computations involving $n$ for
PQ should use $p_L$ instead.
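
The property of Gray codes used here is that stepping the encoded value by one changes exactly one bit. The sketch below uses the standard reflected binary Gray code on plain integers. How the paper lays the bits out across the $O(\log n)$ pairs is not shown, and the names `to_gray`, `from_gray` and `bits_changed` are hypothetical.

```python
def to_gray(k: int) -> int:
    """Reflected binary Gray code of a non-negative integer k."""
    return k ^ (k >> 1)


def from_gray(g: int) -> int:
    """Decode a reflected binary Gray code back to the integer it represents."""
    k = 0
    while g:
        k ^= g
        g >>= 1
    return k


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which the Gray codes of a and b differ."""
    return bin(to_gray(a) ^ to_gray(b)).count("1")


# Moving the pointer by one position flips exactly one encoded bit.
print(all(bits_changed(k, k + 1) == 1 for k in range(1024)))
```

Advancing the pointer by m positions one step at a time therefore costs m bit flips in total, whereas a plain binary counter can flip many bits on a single increment.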


ExtractMin To extract the minimum, a search for the minimum is performed in
PQ, B' and L. If the minimum is in PQ, it is extracted and the other element in
the pair is put at the end of B'. Now there are two empty positions before L, so
the last two elements of L are put there, and the last two elements of B' are put
in the positions they vacate. Note that $p_L$ also needs to be decremented. If the minimum is in
B', it is swapped with the element at position $n$ and returned. If the minimum
is in L, the last element of L is swapped with the element at position $n$, and it
is returned.

Analysis Firstly observe that if we can prove that producing pairs uses amortized
$O(1)$ moves for Insert and ExtractMin, and $O(1)$ and $O(\log n)$ time
respectively, then the rest of the analysis from Section 2.3 carries through. We
first analyze Insert and then ExtractMin.

For Insert there are two variations: either append elements to B', or clean
up B' and insert into PQ. Cleaning up B' and inserting into PQ is expensive,
and we amortize it over the cheap operations. Each operation that just appends
to B' costs $O(1)$ time and moves. Cleaning up B' requires decoding $p_L$, scanning
B' and inserting $O(\log n)$ elements in PQ. Note that between two clean-ups
either $O(\log n)$ elements have been inserted or there has been at least one
ExtractMin, so we charge the time there. Since each insertion into PQ takes
$O(1)$ time and moves amortized, we get the same bound when performing those
insertions. The cost of reading $p_L$ is $O(\log n)$, but since we are guaranteed that
either $\Omega(\log n)$ insertions have occurred or at least one ExtractMin operation,
we can amortize the reading time.